Efficient Tree-Based Topic Modeling
Abstract
Topic modeling with a tree-based prior has been used for a variety of applications because it can encode correlations between words that traditional topic modeling cannot. However, this expressive power comes at the cost of more complicated inference. We extend the SPARSELDA (Yao et al., 2009) inference scheme for latent Dirichlet allocation (LDA) to tree-based topic models. This sampling scheme computes the exact conditional distribution for Gibbs sampling much more quickly than enumerating all possible latent variable assignments. We further improve performance by iteratively refining the sampling distribution only when needed. Experiments show that the proposed techniques dramatically reduce computation time.
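As a rough illustration of the idea the abstract builds on, the sketch below shows the three-bucket decomposition used by SparseLDA for ordinary (flat) LDA, not the tree-based extension proposed in the paper. The unnormalized conditional (α_t + n_{t|d})(β + n_{w|t}) / (βV + n_t) is split into a smoothing term s, a document term r, and a word term q, so that only topics with nonzero counts need to be visited in r and q. All function and variable names here are hypothetical, a symmetric β is assumed, and the buckets are recomputed on every call for clarity, whereas a practical sampler would cache and update them incrementally.

```python
import random


def dense_weights(alpha, beta, vocab_size, doc_topic, word_topic, topic_total):
    """Unnormalized p(z = t | everything else) for each topic, computed directly."""
    return [
        (alpha[t] + doc_topic[t]) * (beta + word_topic[t])
        / (beta * vocab_size + topic_total[t])
        for t in range(len(alpha))
    ]


def bucket_terms(alpha, beta, vocab_size, doc_topic, word_topic, topic_total):
    """Split the same mass into three lists of (topic, weight) terms.

    s: smoothing-only term, nonzero for every topic
    r: document term, nonzero only for topics used in this document
    q: word term, nonzero only for topics this word type is assigned to
    """
    topics = range(len(alpha))
    denom = [beta * vocab_size + topic_total[t] for t in topics]
    s = [(t, alpha[t] * beta / denom[t]) for t in topics]
    r = [(t, doc_topic[t] * beta / denom[t]) for t in topics if doc_topic[t] > 0]
    q = [
        (t, (alpha[t] + doc_topic[t]) * word_topic[t] / denom[t])
        for t in topics
        if word_topic[t] > 0
    ]
    return s, r, q


def sample_topic(alpha, beta, vocab_size, doc_topic, word_topic, topic_total, rng):
    """Draw one topic from the exact conditional using the bucket decomposition."""
    s, r, q = bucket_terms(alpha, beta, vocab_size, doc_topic, word_topic, topic_total)
    total = sum(w for bucket in (s, r, q) for _, w in bucket)
    u = rng.random() * total
    # Walk the sparse buckets first (q, then r); the dense bucket s comes last.
    for bucket in (q, r, s):
        for t, w in bucket:
            u -= w
            if u <= 0:
                return t
    return s[-1][0]  # fallback in case floating-point round-off leaves u > 0


# Toy counts (made up for illustration): 4 topics, vocabulary of 1000 word types.
alpha = [0.1, 0.1, 0.1, 0.1]
beta = 0.01
vocab_size = 1000
doc_topic = [3, 0, 0, 2]       # n_{t|d}: tokens in this document per topic
word_topic = [0, 0, 5, 1]      # n_{w|t}: occurrences of this word per topic
topic_total = [50, 40, 60, 30]  # n_t: total tokens per topic

topic = sample_topic(alpha, beta, vocab_size, doc_topic, word_topic,
                     topic_total, random.Random(0))
```

Because the three buckets sum to exactly the dense weights, the draw comes from the same distribution as naive enumeration; the saving comes from the q and r buckets usually being short and carrying most of the probability mass.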
Similar resources
Efficient Methods for Inferring Large Sparse Topic Hierarchies
Latent variable topic models such as Latent Dirichlet Allocation (LDA) can discover topics from text in an unsupervised fashion. However, scaling the models up to the many distinct topics exhibited in modern corpora is challenging. “Flat” topic models like LDA have difficulty modeling sparsely expressed topics, and richer hierarchical models become computationally intractable as the number of t...
DC-NMF: nonnegative matrix factorization based on divide-and-conquer for fast clustering and topic modeling
The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data available from numerous sources. Nonnegative matrix factorization (NMF) has proven to be a successful method for cluster and topic discovery in unlabeled data sets. In this paper, we propose a fast algorithm for computing NMF using a divide-and-conquer strategy, called DC-NM...
Presenting a Model for Predicting Tax Evasion of Guilds Based on Data Mining Technique
Given the importance of the topic and the gap in previous research, this study presents a model for predicting tax evasion by guilds based on data mining techniques. The analyzed data come from a review of 5600 tax files of all trades with tax codes in Qazvin province during the years 2013-2018. The tax files related to guilds fall into five tax groups, including the guild group o...
Capturing Term Dependencies using a Sentence Tree based Language Model
We describe a new probabilistic Sentence Tree Language Modeling approach that captures term dependency patterns in Topic Detection and Tracking’s (TDT) Story Link Detection task. New features of the approach include modeling the syntactic structure of sentences in documents by a sentence-bin approach and a computationally efficient algorithm for capturing the most significant sentence level ter...
WSSE: A Web Service Search Engine for Large Scale Web Service Discovery based on the Probabilistic Topic Modeling and Clustering
With the ever-increasing number of web services, discovering the appropriate web service requested by users has become a vital yet challenging task. The aim of this project is to provide an efficient search engine that can retrieve the most relevant web services in a short time. The proposed search engine, WSSE, is based on probabilistic topic modeling and clustering techniques that are integ...
Publication date: 2012